Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction

Vuong, An, Van, Minh-Hao, Verma, Prateek, Zhao, Chen, Wu, Xintao

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine learning methods have addressed specific challenges in this field, there is still a lack of foundation models designed for broad tasks like polymer property prediction using multimodal data. In this work, we present a multimodal polymer dataset to fine-tune VLMs through instruction-tuning pairs and assess the impact of multimodality on prediction performance. Our models, fine-tuned with LoRA, outperform unimodal and baseline approaches, demonstrating the benefit of multimodal learning. Additionally, this approach reduces the need to train separate models for different properties, lowering deployment and maintenance costs.
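The instruction-tuning pairs described in the abstract can be pictured with a minimal sketch. The field names (`smiles`, `image_path`, `Tg`), the prompt template, and the example values are all hypothetical illustrations, not taken from the paper:

```python
# Hypothetical sketch: turning one multimodal polymer record into an
# (instruction, response) pair for fine-tuning. Field names and the
# prompt wording are illustrative assumptions.

def make_instruction_pair(record, prop="glass transition temperature (Tg)"):
    """Build one instruction-tuning pair from a polymer record."""
    instruction = (
        f"Given the polymer structure image at {record['image_path']} "
        f"and its SMILES string {record['smiles']}, predict its {prop}."
    )
    response = f"The predicted {prop} is {record['Tg']:.1f} K."
    return {"instruction": instruction, "output": response}

pair = make_instruction_pair(
    {"smiles": "*CC(*)c1ccccc1", "image_path": "polymer_0001.png", "Tg": 373.0}
)
```

A single model fine-tuned on many such pairs, with the property named in the instruction, is what lets one model cover several properties instead of training one regressor per property.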


LLM Agents for Interactive Exploration of Historical Cadastre Data: Framework and Application to Venice

Karch, Tristan, Saydaliev, Jakhongir, Di Lenardo, Isabella, Kaplan, Frédéric

arXiv.org Artificial Intelligence

Cadastral data reveal key information about the historical organization of cities but are often non-standardized due to diverse formats and human annotations, complicating large-scale analysis. As a case study, we explore Venice's urban history during the critical period from 1740 to 1808, which captures the transition following the fall of the ancient Republic and the end of the Ancien Régime. This era's complex cadastral data, marked by its volume and lack of uniform structure, presents unique challenges that our approach adeptly navigates, enabling us to generate spatial queries that bridge past and present urban landscapes. We present a text-to-programs framework that leverages Large Language Models (LLMs) to translate natural language queries into executable code for analyzing historical cadastral records. Our methodology implements two complementary techniques: a SQL agent for handling structured queries about specific cadastral information, and a coding agent for complex analytical operations requiring custom data manipulation. We propose a taxonomy that classifies historical research questions based on their complexity and analytical requirements, mapping them to the most appropriate technical approach. This framework is supported by an investigation into the execution consistency of the system, alongside a qualitative analysis of the answers it produces. By ensuring interpretability and minimizing hallucination through verifiable program outputs, we demonstrate the system's effectiveness in reconstructing past population information, property features, and spatiotemporal comparisons in Venice.
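The two-agent design can be pictured as a dispatcher that sends structured lookups to a SQL agent and open-ended analyses to a coding agent. The keyword heuristic below is a toy stand-in for the paper's taxonomy-based classification; the hint phrases are invented:

```python
# Illustrative dispatcher: route a historical-research question to one of
# the two agents. The keyword list is an assumption for demonstration,
# not the paper's actual classifier.

SQL_HINTS = {"how many", "list", "who owned", "which parcels", "count"}

def route(question: str) -> str:
    q = question.lower()
    if any(hint in q for hint in SQL_HINTS):
        return "sql_agent"     # structured lookup over cadastral tables
    return "coding_agent"      # custom analysis: joins, spatial ops, stats

assert route("How many parcels did the church own in 1740?") == "sql_agent"
assert route("Compare rent distributions between 1740 and 1808") == "coding_agent"
```

In the real system the routing decision would come from an LLM applying the taxonomy, but the downstream contract is the same: both paths emit programs whose outputs can be verified, which is what keeps answers interpretable.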


Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type

Song, Seokwon, Lee, Taehyun, Ahn, Jaewoo, Sung, Jae Hyuk, Kim, Gunhee

arXiv.org Artificial Intelligence

Conceptual combination is a cognitive process that merges basic concepts, enabling the creation of complex expressions. During this process, the properties of a combination (e.g., the whiteness of a peeled apple) can be inherited from the basic concepts, newly emerge, or be canceled. However, previous studies have evaluated a limited set of properties and have not examined the generative process. To address this gap, we introduce the Conceptual Combination with Property Type dataset (CCPT), which consists of 12.3K annotated triplets of noun phrases, properties, and property types. Using CCPT, we establish three types of tasks to thoroughly evaluate LLMs on conceptual combination. Our key findings are threefold: (1) Our automatic metric grading property emergence and cancellation closely corresponds with human judgments. (2) LLMs, including OpenAI's o1, struggle to generate noun phrases that possess given emergent properties. (3) Our proposed method, inspired by a cognitive psychology model that explains how relationships between concepts are formed, improves performance on all generative tasks. The dataset and experimental code are available at https://github.com/seokwon99/CCPT.git.


Let's Think Var-by-Var: Large Language Models Enable Ad Hoc Probabilistic Reasoning

Xia, Shepard, Lu, Brian, Eisner, Jason

arXiv.org Artificial Intelligence

A hallmark of intelligence is the ability to flesh out underspecified situations using "common sense." We propose to extract that common sense from large language models (LLMs), in a form that can feed into probabilistic inference. We focus our investigation on guesstimation questions such as "How much are Airbnb listings in Newark, NJ?" Formulating a sensible answer without access to data requires drawing on, and integrating, bits of common knowledge about how Price and Location may relate to other variables, such as Property Type. Our framework answers such a question by synthesizing an ad hoc probabilistic model. First we prompt an LLM to propose a set of random variables relevant to the question, followed by moment constraints on their joint distribution. We then optimize the joint distribution p within a log-linear family to maximize the overall constraint satisfaction. Our experiments show that LLMs can successfully be prompted to propose reasonable variables, and while the proposed numerical constraints can be noisy, jointly optimizing for their satisfaction reconciles them. When evaluated on probabilistic questions derived from three real-world tabular datasets, we find that our framework performs comparably to a direct prompting baseline in terms of total variation distance from the dataset distribution, and is similarly robust to noise.
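The optimization step can be sketched on a toy problem: fit a log-linear joint distribution over two discrete variables so that its moments match given constraints. The variables, features, and target moments below are invented stand-ins for what an LLM would propose; the fitting rule is standard dual gradient ascent for maximum-entropy models, not necessarily the paper's exact optimizer:

```python
import math, itertools

# Toy joint over two invented variables: PropertyType x Price.
states = list(itertools.product(["apartment", "house"], ["low", "high"]))

# Two invented moment constraints standing in for LLM-elicited knowledge.
features = [
    lambda t, p: 1.0 if p == "high" else 0.0,                   # E[f1] = 0.4
    lambda t, p: 1.0 if t == "house" and p == "high" else 0.0,  # E[f2] = 0.3
]
targets = [0.4, 0.3]

def distribution(theta):
    """Log-linear joint: p(t, pr) proportional to exp(sum_k theta_k * f_k)."""
    weights = [math.exp(sum(th * f(t, pr) for th, f in zip(theta, features)))
               for t, pr in states]
    z = sum(weights)
    return [w / z for w in weights]

# Dual gradient ascent: nudge each theta_k until model moments hit targets.
theta = [0.0, 0.0]
for _ in range(5000):
    probs = distribution(theta)
    moments = [sum(pr * f(t, p) for pr, (t, p) in zip(probs, states))
               for f in features]
    theta = [th + 0.5 * (tg - m) for th, tg, m in zip(theta, targets, moments)]
```

With noisy, mutually inconsistent constraints the gradient still converges, but to a compromise distribution that trades off the violations, which is the "reconciliation" effect the abstract describes.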


Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall

Yuan, Jiaqing, Pan, Lin, Hang, Chung-Wei, Guo, Jiang, Jiang, Jiarong, Min, Bonan, Ng, Patrick, Wang, Zhiguo

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue. In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall (pretraining-only models consistently outperform their instruction-tuned counterparts) and that model scaling helps (larger models outperform smaller ones across all model families). However, the best performance, from GPT-4, still falls well short of the upper bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling known from unknown knowledge, we find the degradation is attributable to exemplars that contradict a model's known knowledge, and grows with the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.


DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal

Du, Wei-Wei, Wang, Wei-Yao, Peng, Wen-Chih

arXiv.org Artificial Intelligence

The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal is a high-cost property-valuation task for financial institutions, since it requires domain experts to produce estimates based on specialized knowledge and judgment of the market. Existing automated valuation models reduce the subjectivity of domain experts but require a large number of transactions for effective evaluation, which is limited not only by the labeling effort that transactions demand but also by poor generalizability to newly developed and rural areas. To learn representations from unlabeled real estate sets, existing self-supervised learning (SSL) methods for tabular data neglect various important features and fail to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with intra-sample geographic prediction as the pretext task, based on the metadata of the real estate, to equip the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to make the representations robust to the limited transactions of downstream tasks. Our benchmark results on three property types of real-world transactions show that DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches in the few-shot scenarios by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.


What's in the laundromat? Mapping and characterising offshore owned domestic property in London

Bourne, Jonathan, Ingianni, Andrea, McKenzie, Rex

arXiv.org Artificial Intelligence

The UK, particularly London, is a global hub for money laundering, a significant portion of which uses domestic property. However, understanding the distribution and characteristics of offshore domestic property in the UK is challenging due to data availability. This paper attempts to remedy that situation by enhancing a publicly available dataset of UK property owned by offshore companies. We create a data processing pipeline which draws on several datasets and machine learning techniques to create a parsed set of addresses classified into six use classes. The enhanced dataset contains 138,000 properties, 44,000 more than the original dataset. The majority are domestic (95k), with a disproportionate amount of those in London (42k). The average offshore domestic property in London is worth 1.33 million GBP; collectively this amounts to approximately 56 billion GBP. We perform an in-depth analysis of the offshore domestic property in London, comparing the price, distribution and entropy/concentration with Airbnb property, low-use/empty property and conventional domestic property. We estimate that the total amount of offshore, low-use and Airbnb property in London is between 144,000 and 164,000 and that they are collectively worth between 145 and 174 billion GBP. Furthermore, offshore domestic property is more expensive and has higher entropy/concentration than all other property types. In addition, we identify two different types of offshore property, nested and individual, which have different price and distribution characteristics. Finally, we release the enhanced offshore property dataset, the complete low-use London dataset and the pipeline for creating the enhanced dataset to reduce the barriers to studying this topic.
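The entropy/concentration comparison mentioned in the abstract can be illustrated with Shannon entropy over a categorical distribution of properties per use class. The counts below are made up for illustration and do not come from the paper's data:

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (bits) of a categorical distribution given raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical property counts per use class in two areas.
concentrated = [90, 5, 3, 2]     # dominated by one class: low entropy
spread_out   = [25, 25, 25, 25]  # evenly spread: maximal entropy

assert shannon_entropy(spread_out) == 2.0   # uniform over 4 classes = log2(4)
assert shannon_entropy(concentrated) < shannon_entropy(spread_out)
```

Higher entropy in this sense means ownership or use is spread more evenly across categories, which is the reading under which offshore domestic property is reported as having higher entropy than other property types.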


Artificial intelligence reveals top 30 investor suburbs

#artificialintelligence

A new research tool launched by buyer's agency network BuyersBuyers promises to take the guesswork out of suburb selection by using artificial intelligence to match a purchaser's budget with their best prospects for capital growth. BuyersBuyers co-founder Pete Wargent said the unique Where to Buy tool provided answers on which location and what sort of property would be the best choice for investors or owner-occupiers under a specific budget. "We've created a simple online process that improves the customer journey, and can help buyers to reduce time, cost and stress in their search," Mr Wargent said. The tool, which was developed in collaboration with RiskWise Property Research, assesses metrics including housing supply, median values, 12-month price growth and vacancy rates to determine whether the locations would provide risky or rewarding prospects for investment. RiskWise Property Research chief executive Doron Peleg said the new offering would complement a suite of research tools developed in conjunction with BuyersBuyers that were free for subscribers. "For example, for 2022, we ran a list of thirty suburbs which are expected to perform well for investors with a budget of up to around $1 million," Mr Peleg said.


WattScale is an open source AI tool that identifies energy-wasting homes

#artificialintelligence

Researchers at the University of Pittsburgh, University of Massachusetts Amherst, and Microsoft Research India have developed a system -- WattScale -- that leverages AI to pick out the least energy-efficient buildings from a city- or region-level population. In a preprint study, they used it to show that half of the buildings in a 10,000-building data set were inefficient, in large part due to poor construction. Buildings also account for over a third of the nation's greenhouse gas emissions, more than any other sector of the economy. Solving for the disparity requires identifying buildings that are the least efficient and thus have the greatest need for improvements, but approaches that rely on the age of a building or its total energy bill don't work well; greater energy usage doesn't necessarily point to inefficiencies. WattScale aims to address this with (1) a Bayesian modeling technique that captures variable distributions governing the energy usage of a building and (2) a fault analysis algorithm that makes use of these distributions to report probable causes of inefficiency.
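The core idea, flagging buildings whose usage is anomalously high relative to the population rather than relying on age or total bills, can be sketched in a drastically simplified form. The z-score rule below is a toy substitute for WattScale's Bayesian distribution modeling, and the intensity figures are invented:

```python
import statistics

def flag_inefficient(usage_per_sqft, threshold=1.5):
    """Flag buildings whose energy intensity is far above the population mean.

    A z-score cutoff is a toy stand-in for WattScale's comparison against
    learned distributions of building energy usage.
    """
    mu = statistics.mean(usage_per_sqft)
    sd = statistics.stdev(usage_per_sqft)
    return [i for i, u in enumerate(usage_per_sqft)
            if (u - mu) / sd > threshold]

# Hypothetical kWh-per-square-foot intensities for six buildings.
intensities = [0.8, 0.9, 1.0, 1.1, 0.95, 2.4]
print(flag_inefficient(intensities))  # -> [5], only the last building
```

Normalizing by floor area is what separates "big building, big bill" from genuine inefficiency; the paper's fault-analysis step then goes further and attributes the anomaly to probable causes such as poor construction.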


Smarter Pricing for Airbnb Using Machine Learning

#artificialintelligence

[You can find the project files and slides on my GitHub; the final project is accessible as an interactive web app.] I recently designed a new approach to automatic pricing for Airbnb listings using the Inside Airbnb dataset. I used linear regression to establish a base price and time series analysis to forecast price fluctuations due to the date. I used unsupervised learning to build a recommender system so hosts could compare their listing to other similar popular listings.
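The base-price step can be pictured with a one-feature least-squares fit. The bedroom counts and nightly prices below are invented, and a real model would of course use many more features from the Inside Airbnb dataset:

```python
# Toy sketch of the base-price idea: ordinary least squares with a single
# feature (number of bedrooms). Data points are invented for illustration.

def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

bedrooms = [1, 1, 2, 2, 3, 4]
prices   = [80, 95, 130, 140, 185, 240]   # nightly price in USD

slope, intercept = fit_line(bedrooms, prices)
base_price = slope * 1 + intercept        # base price for a 1-bedroom listing
```

The time-series component would then adjust this base price up or down by date (weekends, seasonality), and the recommender compares the listing against similar popular ones rather than predicting a price directly.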